Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Apache Lucene to 9.8.0-snapshot-4373c3b #8668

Merged

Conversation

reta
Copy link
Collaborator

@reta reta commented Jul 12, 2023

Description

Update Apache Lucene to 9.8.0-snapshot-79b9664

Related Issues

N/A

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@reta reta force-pushed the update.lucene.9.8.0-snapshot-79b9664 branch from 58be329 to 04552d1 Compare July 13, 2023 13:13
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@reta reta force-pushed the update.lucene.9.8.0-snapshot-79b9664 branch from 04552d1 to 3bf1fec Compare July 13, 2023 20:13
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

baba-devv pushed a commit to baba-devv/OpenSearch that referenced this pull request Jul 29, 2023
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
willyborankin pushed a commit to opensearch-project/security that referenced this pull request Jul 31, 2023
…3069)

There are multiple PRs in core affecting the security plugin that the
security plugin needs to adapt to.

- opensearch-project/OpenSearch#7792
- opensearch-project/OpenSearch#8826
- opensearch-project/OpenSearch#8668

I am opening a Draft PR that includes a fix for the Lucene-related test
failures which was caused by
opensearch-project/OpenSearch#7792

Resolves: #3064

Signed-off-by: Craig Perkins <cwperx@amazon.com>
cwperks added a commit to cwperks/security that referenced this pull request Jul 31, 2023
…pensearch-project#3069)

There are multiple PRs in core affecting the security plugin that the
security plugin needs to adapt to.

- opensearch-project/OpenSearch#7792
- opensearch-project/OpenSearch#8826
- opensearch-project/OpenSearch#8668

I am opening a Draft PR that includes a fix for the Lucene-related test
failures which was caused by
opensearch-project/OpenSearch#7792

Resolves: opensearch-project#3064

Signed-off-by: Craig Perkins <cwperx@amazon.com>
(cherry picked from commit 08d1734)
@zhichao-aws
Copy link
Member

zhichao-aws commented Sep 12, 2023

@reta @nknize Will this lucene version change in 2.11 release? This is corresponding to an important decision-making in our feature. This is a re-invent feature and scheduled to release at 2.11.
For your information, we will introduce sparse semantic retrieval, which leverages lucene FeatureField and FeatureQuery. In older lucene version the FeatureQury performs bad because it cannot return real score upperbound to skip evaluating incompetitive docs. So we need to add a new settings to address the problem. However, in 9.8.0, lucene does many optimizations for Boolean query and we find we don't need this settings to boost the efficiency. This will affect our choice for field mapper and query clause.

@reta
Copy link
Collaborator Author

reta commented Sep 12, 2023

@reta @nknize Will this lucene version change in 2.11 release? This is corresponding to an important decision-making in our feature. This is a re-invent feature and scheduled to release at 2.11.

@zhichao-aws if 9.8.0 happens to be released before 2.11, I think we could get it in, but I don't have a date for this release sadly [1]

[1] https://github.com/apache/lucene/milestones

kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
@zhichao-aws
Copy link
Member

@reta @nknize Will this lucene version change in 2.11 release? This is corresponding to an important decision-making in our feature. This is a re-invent feature and scheduled to release at 2.11.

@zhichao-aws if 9.8.0 happens to be released before 2.11, I think we could get it in, but I don't have a date for this release sadly [1]

[1] https://github.com/apache/lucene/milestones

I see, thanks!

@msfroh
Copy link
Collaborator

msfroh commented Sep 14, 2023

@zhichao-aws For the record, @zhaih kicked off the 9.8 release discussion on the Lucene mailing list a couple of days ago: https://lists.apache.org/thread/xvhqdb0vzyfqsbpl9lvlrxxm2v4tr39z

@zhichao-aws
Copy link
Member

@zhichao-aws For the record, @zhaih kicked off the 9.8 release discussion on the Lucene mailing list a couple of days ago: https://lists.apache.org/thread/xvhqdb0vzyfqsbpl9lvlrxxm2v4tr39z

That's good to know, thanks :)

@zhichao-aws
Copy link
Member

zhichao-aws commented Sep 20, 2023

Lucene has cut the 9.8 branch today https://github.com/apache/lucene/tree/branch_9_8. Can we make it to upgrade to lucene 9.8 in 2.x before 2.11 release?

@reta
Copy link
Collaborator Author

reta commented Sep 20, 2023

Lucene has cut the 9.8 branch today https://github.com/apache/lucene/tree/branch_9_8. Can we make it to upgrade to lucene 9.8 in 2.x before 2.11 release?

Once the artifacts are published to Maven Central - for sure

brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Ivan Brusic <ivan.brusic@flocksafety.com>
@CEHENKLE
Copy link
Member

Hey @reta We've got this weird little "hurry up" release locking next week (Oct 4th), and I'm worried we won't have a chance to absorb these changes with everything else going on. What would you think about holding merging this until after Oct 18th? Is there a reason we should to it sooner?

@reta
Copy link
Collaborator Author

reta commented Sep 26, 2023

What would you think about holding merging this until after Oct 18th? Is there a reason we should to it sooner?

Hey @CEHENKLE , this was not backported to 2.x (we backport only releases), so non-issue :)

@CEHENKLE
Copy link
Member

Ah coolio. Brainfail on my side. Thank you, @reta!

@CEHENKLE
Copy link
Member

Hey, it looks like neural-search may be need these changes for 2.11 (I've asked them to comment on the issue). In the meantime, @reta (or others), do you know how impactful these changes would be to other parts of the core/the plugins if we merged them?

It's hard, because if we're going to do it, we need to go as soon as possible to give time for folks to absorb changes. But if we're not going to go, I don't want to backport and cause thrash. :(

(@nknize, @andrross for more eyes)

@msfroh
Copy link
Collaborator

msfroh commented Sep 30, 2023

Technically this PR won't be the one to be backported, but rather #10276 (based on the released Lucene 9.8.0, which may be identical to the snapshot merged in this PR).

The differences likely to impact other parts of the code are probably the API changes: https://lucene.apache.org/core/9_8_0/changes/Changes.html#v9.8.0.api_changes, and changes to runtime behavior: https://lucene.apache.org/core/9_8_0/changes/Changes.html#v9.8.0.changes_in_runtime_behavior

Those changes to runtime behavior would potentially seem risky for our concurrent search work, but AFAIK, we're not seeing differences between 2.x and main -- right? (If anything, ensuring that 2.x and main behave the same there feels like that will reduce thrashing.)

@reta
Copy link
Collaborator Author

reta commented Oct 1, 2023

Technically this PR won't be the one to be backported, but rather #10276 (based on the released Lucene 9.8.0, which may be identical to the snapshot merged in this PR).

@CEHENKLE as @msfroh pointed out, Apache Lucene 9.8.0 is released so technically we are good to go for 2.x (probably make it to 2.11). We don't see any regressions so far, the changes related to concurrent search seems to be properly tracked and accommodated.

@reta reta mentioned this pull request Oct 1, 2023
6 tasks
@model-collapse
Copy link

Thanks @msfroh and @reta for pointing out. From neural-search side, Lucene 9.8 is needed so as not to overwirite the FeatureField implementation for performance issues and we don't want to deprecate in the future the parameter (for performance tuning) which we have to bring up in the condition of Lucene 9.7.

Meanwhile we are also aware of some potential risks of pushing #10276 forward:

  1. Before OpenSearch 2.11, the main branch of ml-commons is behind 2.x on some commits. Although this has fixed at the moment, we still suspect that some other repos may have the same kind of problems. The main branch should catch up with 2.x with the latest features & implementations before we back-port Lucene 9.8 to 2.x, otherwise, there will be endless conflicts to deal with.
  2. We believe that all repos have done their functional tests as well as some benchmarkings to their main branch with Lucene 9.8 snapshot and so far there was no negative feedback. But as a rigorous procedure, we suggest all the repo to reproduce their tests with the formal release artifacts of Luence 9.8 (not snapshot version) so as to validate whether there's some negative impact from the Lucene upgrade to OpenSearch. The neural search plugin is absorbing the performance improvement, we also want to see others' feedback.
  3. We'd like to see whether the change of APIs or runtime behavior will have impact to AWS hosted search service which will subscribe this upgrade in the feature. That is another topic. We shall be aware but not discuss it on github.

@reta
Copy link
Collaborator Author

reta commented Oct 2, 2023

Meanwhile we are also aware of some potential risks of pushing #10276 forward:

Thanks @model-collapse , with concerns you have raised it looks like we should not rush 9.8.0 into 2.11 branch (since the release is a few days away) but wait for cut off and plan for 2.12.

@msfroh
Copy link
Collaborator

msfroh commented Oct 2, 2023

Before OpenSearch 2.11, the main branch of ml-commons is behind 2.x on some commits. Although this has fixed at the moment, we still suspect that some other repos may have the same kind of problems. The main branch should catch up with 2.x with the latest features & implementations before we back-port Lucene 9.8 to 2.x, otherwise, there will be endless conflicts to deal with.

My understanding is that ml-commons is unusual in terms of committing on 2.x directly. Other repos generally do the right thing: commit on main and backport to 2.x.

@Pallavi-AWS
Copy link
Member

@model-collapse do you agree with below assessment from @reta? With the tight timelines for 2.11 release, ideally we don't want to force a Lucene upgrade.

Meanwhile we are also aware of some potential risks of pushing #10276 forward:

Thanks @model-collapse , with concerns you have raised it looks like we should not rush 9.8.0 into 2.11 branch (since the release is a few days away) but wait for cut off and plan for 2.12.

@heemin32
Copy link
Contributor

heemin32 commented Oct 2, 2023

I agree with @msfroh that most repo will follow same pattern of commit to main and backport to 2.x in general. Therefore, I don't see a much risk on point 1 and point 2 especially we have been using lucene 9.8 snapshot in main branch for a while.

For point 3, is there any specific action item we generally take before lucene version bump? Shouldn't it be accessed and addressed during AWS hosted search service release in a separate timeline?

@krishna-ggk
Copy link

I agree with @msfroh that most repo will follow same pattern of commit to main and backport to 2.x in general. Therefore, I don't see a much risk on point 1 and point 2 especially we have been using lucene 9.8 snapshot in main branch for a while.

IMHO, I don't think we can equate testing of main to be equal to testing for 2.x. Just a simple diff (without any contextual/functional analysis) shows 5k+ files, 1400+ commits and 70k+ lines. Agree with assessment, lucene upgrade nearing 2.11 release window does feel risky.

shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants